Ship 8 Tranche 5a: high-bit 4:2:0 RGBA u8 SIMD by uqio · Pull Request #25 · Findit-AI/colconv

uqio · 2026-04-26T11:57:59Z

Summary

Adds u8 RGBA SIMD across all 5 backends for high-bit 4:2:0 YUV (yuv420p9/10/12/14/16, p010/p012/p016) and wires them into the 8 high-bit u8 RGBA dispatchers in `src/row/mod.rs`. Builds on the scalar prep + dispatcher signatures landed in PR #24. The companion u16 RGBA SIMD work is deferred to Tranche 5b.

Changes

- 5 SIMD backends (NEON / SSE4.1 / AVX2 / AVX-512 / wasm simd128) each gain a const-generic `*_to_rgb_or_rgba_row<BITS, ALPHA>` template across 4 kernel families:
  - planar BITS-generic: `yuv_420p_n_to_rgb_or_rgba_row<BITS={9,10,12,14}, ALPHA>`
  - semi-planar BITS-generic: `p_n_to_rgb_or_rgba_row<BITS={10,12}, ALPHA>` (P016 has its own family)
  - 16-bit planar: `yuv_420p16_to_rgb_or_rgba_row`
  - 16-bit semi-planar: `p16_to_rgb_or_rgba_row`

  The existing RGB wrappers and the new RGBA wrappers are thin shims over the shared template. Only two things branch on `ALPHA`: the store (`vst3q_u8` vs `vst4q_u8`, `write_rgb_` vs `write_rgba_`) and the scalar-tail dispatch. Per-pixel math is unchanged.
- 8 high-bit u8 RGBA dispatchers wired in `src/row/mod.rs` (`yuv420p9/10/12/14/16_to_rgba_row`, `p010/p012/p016_to_rgba_row`), replacing the prior `let _ = use_simd` stubs with the standard `cfg_select!` per-arch route block, mirroring the existing RGB dispatchers. `use_simd = false` still forces scalar.
- Per-backend RGBA equivalence tests: ~30 new `#[test]` functions across the 5 backend test modules. Each new x86 test gates on `is_x86_feature_detected!` so the suite stays clean under sanitizer/Miri/non-feature-flagged CI runners.
- Compile-time `const { assert!(BITS == ...) }` retained on every shared template (already a Codex-flagged hardening from prior tranches).
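The shared-template pattern described above can be illustrated with a deliberately simplified scalar sketch. It is not the code in this PR: the `gray_n_*` and `gray10_*` names, the luma-only math, and the opaque `0xFF` alpha value are all assumptions made for illustration. It only shows how one const-generic body can serve both RGB and RGBA wrappers, with a compile-time `BITS` check and an `ALPHA` branch confined to the store.

```rust
/// Hypothetical shared kernel: `BITS`-bit luma samples in, packed 8-bit RGB or RGBA out.
/// (Illustrative stand-in; the real kernels convert YUV with a color matrix.)
fn gray_n_to_rgb_or_rgba_row<const BITS: u32, const ALPHA: bool>(y: &[u16], out: &mut [u8]) {
    // Unsupported bit depths fail when the function is instantiated, not at run time.
    const { assert!(BITS == 9 || BITS == 10 || BITS == 12 || BITS == 14) };

    let bpp = if ALPHA { 4 } else { 3 };
    assert!(out.len() >= y.len() * bpp, "output row too short");

    let max = (1u16 << BITS) - 1;
    for (px, &sample) in out.chunks_exact_mut(bpp).zip(y) {
        // Per-pixel math is the same for both output layouts.
        let v = (sample.min(max) >> (BITS - 8)) as u8;
        px[0] = v;
        px[1] = v;
        px[2] = v;
        // Only the store differs: RGBA writes one extra byte (assumed opaque here).
        if ALPHA {
            px[3] = 0xFF;
        }
    }
}

/// Thin RGB shim over the shared template.
pub fn gray10_to_rgb_row(y: &[u16], out: &mut [u8]) {
    gray_n_to_rgb_or_rgba_row::<10, false>(y, out)
}

/// Thin RGBA shim over the shared template.
pub fn gray10_to_rgba_row(y: &[u16], out: &mut [u8]) {
    gray_n_to_rgb_or_rgba_row::<10, true>(y, out)
}
```

Because `ALPHA` is a const parameter, each instantiation is compiled separately and the `if ALPHA` branch can be resolved at compile time, so the RGB path should not pay for the RGBA store.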

Test plan

`cargo test --lib` on host (aarch64-darwin / NEON path): 485 pass, 0 fail
`cargo check --lib --target wasm32-unknown-unknown` clean
`cargo check --lib --target x86_64-unknown-freebsd` clean (incl. `--tests`)
`RUSTFLAGS="-Dwarnings" cargo clippy --lib --tests` clean
CI: ASAN sanitizer run on x86_64-linux (was failing before `is_x86_feature_detected!` guards were added; should now pass)
CI: Miri on x86_64-linux (was failing before guards; should now pass)
On-device equivalence run for AVX2 / AVX-512 / SSE4.1 hardware (deferred to CI)

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR adds SIMD-backed support for high-bit-depth 4:2:0 RGBA (u8 output) row conversion paths across multiple architectures and wires them into the public row dispatch layer.

Changes:

- Add `use_simd`-controlled dispatcher routing for high-bit 4:2:0 RGBA u8 conversions (YUV420p 9/10/12/14, P010/P012, YUV420p16, P016).
- Implement RGBA SIMD entrypoints by reusing existing RGB kernels via shared `*_to_rgb_or_rgba_row` implementations with an `ALPHA` const parameter.
- Add scalar↔SIMD byte-equivalence tests for the new RGBA SIMD paths across backends.
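A rough sketch of what a `use_simd`-controlled dispatcher can look like follows. It is an assumption-laden illustration rather than the code in `src/row/mod.rs`: it uses plain `#[cfg]` attributes and runtime detection in place of the `cfg_select!` route block, covers only one backend, and every function name in it is hypothetical. The SIMD function body is a placeholder that defers to scalar code.

```rust
/// Hypothetical scalar reference path: 8-bit luma in, packed RGBA out.
fn demo_gray_to_rgba_row_scalar(y: &[u8], out: &mut [u8]) {
    for (px, &v) in out.chunks_exact_mut(4).zip(y) {
        px.copy_from_slice(&[v, v, v, 0xFF]);
    }
}

/// Hypothetical SSE4.1 backend. The body is a placeholder that reuses the
/// scalar code; a real backend would use `core::arch` intrinsics here.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn demo_gray_to_rgba_row_sse41(y: &[u8], out: &mut [u8]) {
    demo_gray_to_rgba_row_scalar(y, out)
}

/// Dispatcher sketch: `use_simd = false` always forces the scalar path.
pub fn demo_gray_to_rgba_row(y: &[u8], out: &mut [u8], use_simd: bool) {
    assert!(out.len() >= y.len() * 4, "output row too short");
    if use_simd {
        #[cfg(target_arch = "x86_64")]
        {
            if std::arch::is_x86_feature_detected!("sse4.1") {
                // SAFETY: SSE4.1 support was verified at run time just above.
                unsafe { demo_gray_to_rgba_row_sse41(y, out) };
                return;
            }
        }
    }
    // Fallback for `use_simd = false`, other architectures, or missing CPU features.
    demo_gray_to_rgba_row_scalar(y, out);
}
```

Whichever route is taken, both settings of `use_simd` are expected to produce byte-identical rows, which is the property the equivalence tests in this PR check.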

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| `src/row/mod.rs` | Wires high-bit 4:2:0 RGBA u8 dispatchers to per-arch SIMD implementations with scalar fallback. |
| `src/row/arch/x86_sse41.rs` | Adds SSE4.1 RGBA wrappers/shared kernels (ALPHA=true) for high-bit 4:2:0 and P010/P012/P016 families. |
| `src/row/arch/x86_sse41/tests.rs` | Adds SSE4.1 scalar equivalence tests for the new RGBA high-bit 4:2:0 paths. |
| `src/row/arch/x86_avx2.rs` | Adds AVX2 RGBA wrappers/shared kernels (ALPHA=true) for high-bit 4:2:0 and P010/P012/P016 families. |
| `src/row/arch/x86_avx2/tests.rs` | Adds AVX2 scalar equivalence tests for the new RGBA high-bit 4:2:0 paths. |
| `src/row/arch/x86_avx512.rs` | Adds AVX-512 RGBA wrappers/shared kernels (ALPHA=true) for high-bit 4:2:0 and P010/P012/P016 families. |
| `src/row/arch/x86_avx512/tests.rs` | Adds AVX-512 scalar equivalence tests for the new RGBA high-bit 4:2:0 paths. |
| `src/row/arch/wasm_simd128.rs` | Adds wasm simd128 RGBA wrappers/shared kernels (ALPHA=true) for high-bit 4:2:0 and P010/P012/P016 families. |
| `src/row/arch/wasm_simd128/tests.rs` | Adds wasm simd128 scalar equivalence tests for the new RGBA high-bit 4:2:0 paths. |
| `src/row/arch/neon.rs` | Adds NEON RGBA wrappers/shared kernels (ALPHA=true) for high-bit 4:2:0 and P010/P012/P016 families. |
| `src/row/arch/neon/tests.rs` | Adds NEON scalar equivalence tests for the new RGBA high-bit 4:2:0 paths. |


Copilot · 2026-04-26T12:02:43Z

+fn check_planar_u8_sse41_rgba_equivalence_n<const BITS: u32>(
+  width: usize,
+  matrix: ColorMatrix,
+  full_range: bool,
+) {


The newly added RGBA equivalence helpers/tests call SSE4.1 intrinsics unconditionally (via `unsafe { yuv_420p_n_to_rgba_row... }`, etc.). Unlike the existing SSE4.1 tests earlier in this file, these helpers don’t gate execution with `std::arch::is_x86_feature_detected!("sse4.1")`, which can cause SIGILL on CPUs without SSE4.1 (and may also break under Miri if the detection would otherwise early-return). Add the same feature-detection guard (either in each helper or at the start of each `#[test]`).
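A guarded equivalence helper along the lines this comment asks for might look like the sketch below. It is illustrative only: the helper and kernel names are hypothetical, the SIMD body is a placeholder that calls the scalar code, and the helper returns a `bool` (whether the comparison actually ran) purely so the skip is observable, which the real tests may not do.

```rust
/// Hypothetical scalar reference: 8-bit luma in, packed RGBA out.
fn ref_gray_to_rgba(y: &[u8], out: &mut [u8]) {
    for (px, &v) in out.chunks_exact_mut(4).zip(y) {
        px.copy_from_slice(&[v, v, v, 0xFF]);
    }
}

/// Hypothetical SSE4.1 kernel (placeholder body; real code would use intrinsics).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn sse41_gray_to_rgba(y: &[u8], out: &mut [u8]) {
    ref_gray_to_rgba(y, out)
}

/// Compares scalar and SSE4.1 output for one row width.
/// Returns `false` when the comparison was skipped because SSE4.1 is unavailable.
fn check_sse41_rgba_equivalence(width: usize) -> bool {
    let y: Vec<u8> = (0..width).map(|i| (i * 37 % 256) as u8).collect();
    let mut want = vec![0u8; width * 4];
    ref_gray_to_rgba(&y, &mut want);

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // The guard: never reach the SSE4.1 code on a CPU that lacks the feature.
        if std::arch::is_x86_feature_detected!("sse4.1") {
            let mut got = vec![0u8; width * 4];
            // SAFETY: SSE4.1 availability was checked at run time just above.
            unsafe { sse41_gray_to_rgba(&y, &mut got) };
            assert_eq!(want, got, "scalar and SSE4.1 rows differ at width {width}");
            return true;
        }
    }
    false
}
```

Putting the guard inside the helper rather than in each `#[test]` means a newly added test cannot forget it, at the cost of silently skipping on hosts without the feature.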

Copilot · 2026-04-26T12:02:43Z

+  width: usize,
+  matrix: ColorMatrix,
+  full_range: bool,
+) {


The new AVX2 RGBA equivalence helpers/tests invoke AVX2 intrinsics unconditionally. Existing AVX2 tests in this file early-return when `!std::arch::is_x86_feature_detected!("avx2")`; these new ones should do the same to avoid illegal-instruction crashes on non-AVX2 hosts (and to match the established test pattern in this module).

Suggested change

-) {
+) {
+  if !std::arch::is_x86_feature_detected!("avx2") {
+    return;
+  }

Copilot · 2026-04-26T12:02:44Z

+  width: usize,
+  matrix: ColorMatrix,
+  full_range: bool,
+) {


The new AVX-512 RGBA equivalence helpers/tests call AVX-512BW intrinsics unconditionally. Other AVX-512 tests in this file are guarded by `std::arch::is_x86_feature_detected!("avx512bw")`; without the same guard these tests can SIGILL on CPUs lacking AVX-512BW. Add the feature-detection early-return (in helpers or per-test) consistent with the rest of the file.

Suggested change

-) {
+) {
+  if !std::arch::is_x86_feature_detected!("avx512bw") {
+    return;
+  }

uqio added 4 commits April 26, 2026 22:46

update

0cc022c

update

cfff0dc

update

6706275

update

88a0de8

al8n requested a review from Copilot April 26, 2026 11:58

Copilot started reviewing on behalf of al8n April 26, 2026 11:59

Copilot AI reviewed Apr 26, 2026


update

00c73d6

al8n changed the title ~~Feat/ship8 rgba high bit 420 u8 simd~~ Ship 8 Tranche 5a: high-bit 4:2:0 RGBA u8 SIMD Apr 26, 2026

uqio merged commit 10d3e17 into main Apr 26, 2026
43 checks passed

uqio deleted the feat/ship8-rgba-high-bit-420-u8-simd branch April 26, 2026 12:13

uqio merged 5 commits into main from feat/ship8-rgba-high-bit-420-u8-simd
